authz/loader: restart file watcher on transient symlink errors#146
Merged
✅ Snyk checks have passed. No issues have been found so far.
Koanf's file.Provider.Watch exits permanently when filepath.EvalSymlinks fails -- which happens every time Kubernetes updates a ConfigMap-mounted volume, since kubelet briefly removes the ..data symlink before creating the new one. The fix: when the watch callback receives an error, sleep 1s and restart the watch. Also reload the policy immediately after restart to pick up any updates missed while dead. This was causing the dataplane authz interceptor to permanently stop picking up policy changes after the first ConfigMap update, requiring a pod restart to recover.
Koanf's file.Provider.Watch exits when filepath.EvalSymlinks fails, which happens during Kubernetes ConfigMap updates (kubelet briefly removes the ..data symlink). Restart the watch in a loop with 1s backoff. Reload the policy after each restart to catch missed updates. Single file.Provider instance, initial watch is synchronous, restart loop runs in a background goroutine. Unwatch closes the stop channel.
7944dc2 to 4302671
simon0191 approved these changes on Mar 20, 2026
f53b4e1 to 2d29eb7
2d29eb7 to 10985cc
What
Restart the policy file watcher when koanf's fsnotify watcher exits due to transient symlink errors during Kubernetes ConfigMap updates.
Why
Koanf's file.Provider.Watch calls filepath.EvalSymlinks on every filesystem event. When Kubernetes updates a ConfigMap-mounted volume, kubelet briefly removes the ..data symlink before creating the new one. During this window, EvalSymlinks fails and koanf's watcher goroutine breaks out of its event loop and exits permanently.
This caused the dataplane authz interceptor to permanently stop picking up policy changes after the first ConfigMap update. The interceptor loaded the initial policy on startup but never saw subsequent updates, requiring a pod restart to recover. We discovered this while debugging why newly created service accounts were not recognized by the authz layer despite being added to the policy ConfigMap.
Implementation details
When the watch callback receives an error (watcher died), sleep 1s and call startWatch() again with a fresh file.Provider. After restarting, immediately reload the policy file to pick up any updates missed while the watcher was dead.
The test simulates the exact Kubernetes ConfigMap update sequence: create new generation directory, remove ..data symlink, brief pause, create new ..data symlink pointing to new generation. It verifies the watcher survives two consecutive swaps.
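The restart behaviour can be sketched independently of koanf. The code below is not the code from this PR: watchFunc is a hypothetical stand-in with the same callback shape as koanf's Watch, and restartingWatcher, reload, and runDemo are illustrative names. It assumes the callback receives a non-nil error once when the underlying watcher exits, and that reload is safe to call from more than one goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// watchFunc is a hypothetical stand-in for a provider's Watch method: it
// registers cb and returns; cb later gets events, or a non-nil error once
// the underlying watcher has exited.
type watchFunc func(cb func(event any, err error)) error

type restartingWatcher struct {
	watch   watchFunc
	reload  func() // must be safe for concurrent use
	backoff time.Duration
	stop    chan struct{}
	once    sync.Once
}

func newRestartingWatcher(w watchFunc, reload func(), backoff time.Duration) *restartingWatcher {
	return &restartingWatcher{watch: w, reload: reload, backoff: backoff, stop: make(chan struct{})}
}

// start registers the callback; watcher death is reported on died.
func (w *restartingWatcher) start(died chan<- error) error {
	return w.watch(func(_ any, err error) {
		if err != nil {
			select {
			case died <- err:
			default: // a restart is already pending
			}
			return
		}
		w.reload()
	})
}

// Start performs the first watch synchronously, then supervises it in the background.
func (w *restartingWatcher) Start() error {
	died := make(chan error, 1)
	if err := w.start(died); err != nil {
		return err
	}
	go func() {
		for {
			select {
			case <-w.stop:
				return
			case <-died:
			}
			for { // back off, then retry until the watch is re-established
				select {
				case <-w.stop:
					return
				case <-time.After(w.backoff):
				}
				if err := w.start(died); err == nil {
					break
				}
			}
			w.reload() // catch up on updates missed while the watcher was dead
		}
	}()
	return nil
}

// Stop ends the supervision loop; it is safe to call more than once.
func (w *restartingWatcher) Stop() { w.once.Do(func() { close(w.stop) }) }

// runDemo kills a fake watcher once and reports how often the watch was
// started and how often the policy was reloaded.
func runDemo() (starts, reloads int) {
	var mu sync.Mutex
	var cb func(any, error)
	registered := make(chan struct{}, 4)
	reloaded := make(chan struct{}, 4)
	watch := func(c func(any, error)) error {
		mu.Lock()
		cb, starts = c, starts+1
		mu.Unlock()
		registered <- struct{}{}
		return nil
	}
	reload := func() {
		mu.Lock()
		reloads++
		mu.Unlock()
		reloaded <- struct{}{}
	}
	w := newRestartingWatcher(watch, reload, 10*time.Millisecond)
	if err := w.Start(); err != nil {
		return
	}
	<-registered
	mu.Lock()
	c := cb
	mu.Unlock()
	c(nil, errors.New("simulated: ..data symlink missing")) // the watcher "dies"
	<-registered                                           // watch restarted after backoff
	<-reloaded                                             // catch-up reload happened
	w.Stop()
	mu.Lock()
	defer mu.Unlock()
	return starts, reloads
}

func main() {
	starts, reloads := runDemo()
	fmt.Println("watch starts:", starts, "reloads:", reloads)
}
```

The demo uses a 10ms backoff to keep the run short; the PR uses 1s. Reloading after every successful restart is what closes the gap: any ConfigMap update that landed while no watcher was running is picked up without waiting for the next filesystem event.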